DIMENSIONALITY REDUCTION


= The number of input features, variables, or columns present in a given
dataset is known as its dimensionality, and the process of reducing the
number of these features is called dimensionality reduction.
WHY DIMENSIONALITY REDUCTION?


= In many cases a dataset contains a very large number of input features,
which makes the predictive modeling task more complicated. Because a
dataset with many features is difficult to visualize and to model,
dimensionality reduction techniques are needed in such cases.


= A dimensionality reduction technique can be defined as "a way of
converting a higher-dimensional dataset into a lower-dimensional
dataset while ensuring that it conveys similar information." These
techniques are widely used in machine learning to obtain a better-fitting
predictive model when solving classification and regression problems.
DIMENSIONALITY


[Figure: model performance plotted against the number of features/dimensions, with the optimal number of features marked.]
WHY DIMENSIONALITY REDUCTION?


= Visualization: projection of high-dimensional data onto 2D or 3D. 


= Data compression: efficient storage and retrieval. 


= Noise removal: positive effect on query accuracy. 
DOCUMENT CLASSIFICATION


= Task: classify unlabeled documents into categories


= Challenge: thousands of terms


= Solution: apply dimensionality reduction


DIMENSIONALITY REDUCTION 


= It is commonly used in fields that deal with high-dimensional
data, such as speech recognition, signal processing, and
bioinformatics. It can also be used for data visualization,
noise reduction, and cluster analysis.
DIMENSIONALITY REDUCTION TECHNIQUES
THE CURSE OF DIMENSIONALITY


= Handling high-dimensional data is very difficult in practice, a
problem commonly known as the curse of dimensionality. As the
dimensionality of the input dataset increases, any machine learning
algorithm or model becomes more complex. As the number of
features increases, the number of samples needed to cover the feature
space grows rapidly (often exponentially), and the chance of overfitting
also increases. A machine learning model trained on high-dimensional
data with too few samples tends to overfit and perform poorly on
unseen data.
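One way to see why high-dimensional data is hard to handle is to count how many regions of the feature space the samples would need to cover. The sketch below assumes each feature is split into a fixed number of equal-width bins; the helper name `cells_to_cover` and the default of 10 bins are illustrative choices, not part of the original slides.

```python
def cells_to_cover(n_features: int, bins_per_feature: int = 10) -> int:
    """Count grid cells when every feature is split into equal-width bins.

    Hypothetical helper for illustration: placing at least one sample in
    each cell needs this many samples, so the count is a rough lower bound.
    """
    return bins_per_feature ** n_features


if __name__ == "__main__":
    # The cell count grows exponentially with the number of features.
    for d in (1, 2, 3, 10):
        print(f"{d:>2} features -> {cells_to_cover(d):,} cells")
```

With a fixed number of samples, most cells stay empty as features are added, which is one reason a model can fit the training data well and still generalize poorly.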
BENEFITS OF APPLYING DIMENSIONALITY REDUCTION


Some benefits of applying dimensionality reduction to a dataset are
given below:


= Reducing the number of features reduces the space required to store the dataset.


= Fewer features require less computation and training time.
= A dataset with fewer features is easier to visualize.


= It removes redundant features (if present) by taking care of multicollinearity.
DISADVANTAGES OF DIMENSIONALITY REDUCTION


There are also some disadvantages of applying dimensionality reduction, which are
given below:


= Some information may be lost due to dimensionality reduction.


= In the PCA dimensionality reduction technique, the number of principal components
to keep is sometimes not known in advance.
APPROACHES TO DIMENSIONALITY REDUCTION


= Feature Selection 


= Feature Extraction 
FEATURE SELECTION


Feature selection is the process of selecting a subset of the relevant
features and leaving out the irrelevant features present in a dataset in order
to build a more accurate model. In other words, it is a way of selecting the
optimal features from the input dataset.
FEATURE SELECTION - METHODS


= Filter Methods


= Wrapper Methods
FILTER METHODS


In this method, the features are scored with a statistical measure, independently of any
model, and only the subset of relevant features is kept. Some common filter techniques are:


= Correlation
= Chi-Square Test
= ANOVA
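As a minimal sketch of the correlation technique, the example below keeps the features whose absolute Pearson correlation with the target reaches a threshold. The function names, the threshold of 0.5, and the toy data are assumptions made for illustration; they do not come from the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, threshold=0.5):
    """Return names of features with |correlation to target| >= threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy data (made up): "a" and "b" track the target, "noise" does not.
target = [1, 2, 3, 4, 5]
features = {
    "a": [2, 4, 6, 8, 10],
    "b": [5, 3, 4, 1, 2],
    "noise": [1, -1, 0, -1, 1],
}
selected = select_by_correlation(features, target)
print(selected)
```

No model is trained at any point, which is what makes this a filter rather than a wrapper approach.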
WRAPPER METHODS


The wrapper method has the same goal as the filter method, but it uses a machine learning
model for evaluation. In this method, a subset of features is fed to the ML model and the
model's performance is evaluated. The performance decides whether those features are added
or removed to increase the accuracy of the model. This method is usually more accurate than
the filter method but is more complex and computationally expensive. Some common wrapper
techniques are:


= Forward Selection 


= Backward Selection 


= Bi-directional Elimination 
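Forward selection can be sketched as a greedy loop: start with no features, add the feature that most improves the model's score, and stop when no addition helps. The example below is an illustration under stated assumptions: the evaluating model is a 1-nearest-neighbour classifier scored by leave-one-out accuracy, and the function names and toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `cols`."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample by squared Euclidean distance on chosen columns.
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((xi[c] - X[k][c]) ** 2 for c in cols),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_selection(X, y, score_fn=loo_1nn_accuracy):
    """Greedily add the feature that most improves score_fn; stop when none does."""
    selected, remaining = [], list(range(len(X[0])))
    best_score = 0.0
    while remaining:
        score, j = max((score_fn(X, y, selected + [j]), j) for j in remaining)
        if score <= best_score:
            break  # no candidate improves the model
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Toy data (made up): column 0 separates the classes, column 1 is misleading.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 5.2), (1.1, 0.8), (1.2, 9.1)]
y = [0, 0, 0, 1, 1, 1]
selected, score = forward_selection(X, y)
print(selected, score)
```

Backward selection follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal hurts the score least.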
FEATURE EXTRACTION


Feature extraction is the process of transforming a space with many dimensions
into a space with fewer dimensions. This approach is useful when we want to keep
as much of the information as possible while using fewer resources to process it.


Some common feature extraction techniques are:
1. Principal Component Analysis
2. Linear Discriminant Analysis
3. Kernel PCA
4. Quadratic Discriminant Analysis
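To make the idea of Principal Component Analysis concrete, the sketch below finds only the first principal component (the direction of greatest variance) and projects the data onto it. It uses power iteration on the sample covariance matrix, which is a simplification chosen to keep the example dependency-free; library implementations typically use a full eigendecomposition or SVD. The function name, iteration count, and toy data are assumptions.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Starting vector; assumed not orthogonal to the top eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variance along the starting direction")
        v = [x / norm for x in w]
    # Project each centered sample onto the component: d features -> 1.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections


# Toy data (made up): every point lies on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, reduced = first_principal_component(data)
print(direction, reduced)
```

Because the toy points lie exactly on a line, one component captures all of their variance, so the two original features reduce to a single value per sample without losing information.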


DR. HAIDER ALI, Machine Learning


THANK YOU 